Random walks with badly approximable numbers 

Doug Hensley and Francis Edward Su 

Abstract. Using the discrepancy metric, we analyze the rate of convergence 
of a random walk on the circle generated by d rotations, and establish sharp 
rates that show that badly approximable $d$-tuples in $\mathbb{R}^d$ give rise to walks with
the fastest convergence. 



1. Introduction 

Fix $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_d)$, a $d$-tuple of real numbers, not all rational. Consider
a random walk on the circle which proceeds as follows: at each step one of
the $\alpha_i$ is chosen (with probability $1/d$) and the walk moves forwards or backwards
(with probability $1/2$) on the circle by an angle $2\pi\alpha_i$. The distribution of this walk
converges to the uniform (Haar) measure on $S^1$; in this paper, we investigate how
quickly this convergence can take place.

To quantify this, identify $S^1$ with the unit circle in $\mathbb{R}^2$, and let $Q$ be the
probability measure supported on $\{e^{\pm 2\pi i \alpha_j}\}$, $j = 1, \ldots, d$, with equal weights $1/2d$. If
at time $k = 0$ the random walk starts at 1, then the convolution power $Q^{*k}$
represents the probability distribution of the random walk at time $k$. In this framework,
we study how quickly $Q^{*k}$ approaches $U$, the uniform (Haar) measure on the unit
circle. Define the discrepancy $D(P)$ of a probability measure $P$ on $S^1$ by

(1)    $D(P) = \sup_I |P(I) - U(I)|$

where $I$ represents any connected interval on $S^1$. The discrepancy $D(P)$ measures
how uniform $P$ is, and metrizes weak-* convergence to the uniform distribution on
$S^1$. The usual definition of discrepancy found in the study of uniform distribution
of sequences mod 1 (see [?]) is a special case of our definition, by letting $P$ be an
equally weighted measure on a sequence.
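To make these definitions concrete, here is a minimal Python sketch that computes the exact law of the walk after $k$ steps and the discrepancy of a finitely supported measure. The function names `walk_distribution` and `discrepancy` are illustrative assumptions, not the authors' code. The sketch assumes that, for finitely many atoms, the supremum over arcs is attained on arcs whose endpoints are atoms, so that $D(P) = \max_b G(b) - \min_a G(a^-)$ with $G(t) = P([0,t]) - t$.

```python
import math
from collections import defaultdict

def walk_distribution(alpha, k):
    """Exact law of the walk after k steps, as (position mod 1, probability) pairs.

    The state is an integer vector n with position sum(n_i * alpha_i), so atoms
    reached along different paths are merged exactly.
    """
    d = len(alpha)
    dist = {tuple([0] * d): 1.0}           # the walk starts at angle 0
    for _ in range(k):
        nxt = defaultdict(float)
        for n, p in dist.items():
            for i in range(d):
                for step in (1, -1):       # forwards or backwards by alpha_i
                    m = list(n)
                    m[i] += step
                    nxt[tuple(m)] += p / (2 * d)
        dist = nxt
    return [(sum(c * a for c, a in zip(n, alpha)) % 1.0, p)
            for n, p in dist.items()]

def discrepancy(atoms):
    """sup over arcs I of |P(I) - U(I)| for a finitely supported measure P."""
    cum = 0.0
    g_before, g_after = [], []
    for x, w in sorted(atoms):
        g_before.append(cum - x)           # G(x-) = P([0, x)) - x
        cum += w
        g_after.append(cum - x)            # G(x)  = P([0, x]) - x
    return max(g_after) - min(g_before)

golden = (1 + math.sqrt(5)) / 2
for k in (4, 16, 64):
    D = discrepancy(walk_distribution([golden], k))
    print(k, D, D * math.sqrt(k))          # D * sqrt(k) should stay bounded
```

Tracking integer vectors rather than floating-point angles keeps the merging of atoms exact; the cost grows like $k^d$ states, so this is only practical for small $d$ and moderate $k$.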

We seek bounds on how quickly $D(Q^{*k})$ diminishes as a function of the number
of steps $k$ and the numbers $\alpha_i$. Clearly the convergence will not occur if all $\alpha_i$ are
rational.

The case $d = 1$ reduces to a random walk generated by one irrational rotation;
this problem was posed by Diaconis in [?, p. 34]. Su [12] obtained discrepancy
bounds for this walk, showing that it converges most quickly when $\alpha$ is a quadratic
irrational. In that case,

$\frac{C_1}{\sqrt{k}} \le D(Q^{*k}) \le \frac{C_2}{\sqrt{k}}$

for constants $C_1, C_2$ which can be determined given $\alpha$. We investigate whether the
inclusion of additional generators can speed up the convergence of this walk. In
particular we show that if $\alpha$ is a badly approximable vector in $\mathbb{R}^d$, then the random
walk generated by the $\alpha_i$ converges as fast as possible for a set of $d$ generators. In
Theorem 1 we obtain matching upper and lower bounds:

$\frac{C_1}{k^{d/2}} \le D(Q^{*k}) \le \frac{C_2}{k^{d/2}}$.

We also show how the constants depend on $\alpha$.

This result is reminiscent of results in the theory of uniform distribution of 
sequences mod 1; however, there is a notable lack of any log terms in the upper 
and lower bounds. To compare for $d = 1$, it is known (e.g., see [?]) that for a fixed
irrational $\alpha \in \mathbb{R}$ the discrepancy of the sequence $\{\alpha, 2\alpha, 3\alpha, \ldots, k\alpha\}$ in $\mathbb{R}$ mod 1
falls as $\log k / k$ up to constant factors. For our corresponding random walk, it is
not a surprise that the exponent gets halved, but it is quite surprising that the log 
term disappears. 

While the total variation metric is more frequently used to study random walk
convergence (e.g., see [?]), it is not appropriate for this walk, because $Q^{*k}$ is a
finitely supported measure and its total variation distance from $U$ remains at 1 for
all $k$. Moreover, we favor the use of discrepancy over other common metrics on
probabilities (e.g., Wasserstein, Prokhorov) because bounding techniques for such
metrics are not well developed, and the use of Fourier bounds for the discrepancy
metric admits the possibility of sharp analysis. An analogous discrepancy metric
may be used to study random walks on other spaces, e.g., [13], [14]. For a
survey of bounds relating various metrics, see [?].

We also remark that while the literature contains many results on rates of
convergence for random walks on finite groups (e.g., see [?]) and a few results for
walks on infinite compact groups ([?], [?]), very little has been done for discrete
walks on infinite compact groups, mainly due to lack of bounding techniques for
appropriate metrics. Our analysis of the $d$-generator random walk on the circle
reveals that bounds for the discrepancy metric are refined enough to yield sharp
rates of convergence; this and [12] are the only sharp results we are aware of in this
direction.



2. Random Walk Bounds 

Throughout this paper, let $\|x\|$ denote the Euclidean distance of $x \in \mathbb{R}^d$ to
the nearest integer lattice point (the dimension inherent in the notation $\|\cdot\|$ is to
be understood by context). Given an arbitrary $d$-tuple $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_d)$, the
Dirichlet Approximation Theorem [?, p. 68] implies that there exists a constant
$B = B(\alpha)$ such that for any $q$, there exists a positive integer $N < q^d$ such that

(2)    $\|N\alpha\| < \frac{B}{q}$.

In fact for any $d$-tuple $\alpha$, the above bound holds for $B = \sqrt{d}$.

The $d$-tuple $\alpha$ is said to be badly approximable if there exists a constant $\beta =
\beta(\alpha) > 0$ such that for all positive integers $N$,

(3)    $\|N\alpha\| > \frac{\beta}{N^{1/d}}$.

We refer to $B$ and $\beta$ as approximation constants for $\alpha$. The approximation
constant $\beta$ is only defined for badly approximable $\alpha$.
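The two approximation conditions can be explored numerically. The sketch below is illustrative only: the helper names `lattice_dist`, `estimate_beta` and `dirichlet_witness` are assumptions introduced here, and a minimum over a finite range of $N$ is merely a proxy for the constant $\beta$ in (3), not a proof that a tuple is badly approximable.

```python
import math

def lattice_dist(x):
    """Euclidean distance from the point x in R^d to the nearest integer point."""
    return math.sqrt(sum((t - round(t)) ** 2 for t in x))

def estimate_beta(alpha, n_max):
    """min over 1 <= N <= n_max of N^(1/d) * ||N alpha|| (finite-range proxy)."""
    d = len(alpha)
    return min(N ** (1.0 / d) * lattice_dist([N * a for a in alpha])
               for N in range(1, n_max + 1))

def dirichlet_witness(alpha, q):
    """Search for a positive integer N < q^d with ||N alpha|| < sqrt(d)/q."""
    d = len(alpha)
    for N in range(1, q ** d):
        if lattice_dist([N * a for a in alpha]) < math.sqrt(d) / q:
            return N
    return None

golden = (1 + math.sqrt(5)) / 2
print(estimate_beta([golden], 2000))                       # finite-range estimate
print(dirichlet_witness([math.sqrt(2), math.sqrt(3)], 10))  # some N below 100
```

Floating-point error grows with $N$, so the range should stay well below the point where $\|N\alpha\|$ approaches machine precision.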
Our main result is the following: 

Theorem 1. Suppose $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_d)$ is badly approximable. Let $Q$ denote
the generating measure of the random walk on $S^1$ generated by the $d$ generators $\alpha_i$.
Then the discrepancy of the $k$-th step probability distribution of the walk satisfies

(4)    $\frac{C_1}{k^{d/2}} \le D(Q^{*k}) \le \frac{C_2}{k^{d/2}}$

with constants that depend on $d$ and $\alpha$:

$C_1 = 0.0947 \left(\frac{\sqrt{d}}{5B}\right)^d$

$C_2 = 19.857 \left(\frac{d\sqrt{d}}{\beta}\right)^d$

where $B, \beta$ are approximation constants for $\alpha$. The lower bound holds for walks
generated by an arbitrary $d$-tuple.

Therefore, among all $d$-tuples, walks generated by badly approximable $d$-tuples
converge the fastest. Note that the constants $C_1, C_2$ depend on the approximation
constants of $\alpha$. However since one may choose $B = \sqrt{d}$, it follows that $C_1 \ge
0.0947(1/5)^d$.
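As a quick illustration of how the constants specialize (a worked instance of the stated formulas, not an additional result), take $d = 1$ with the admissible choice $B = \sqrt{d} = 1$ and leave $\beta$ symbolic:

```latex
\[
C_1 = 0.0947\left(\frac{\sqrt{1}}{5\cdot 1}\right)^{1} = 0.01894,
\qquad
C_2 = 19.857\left(\frac{1\cdot\sqrt{1}}{\beta}\right)^{1} = \frac{19.857}{\beta},
\]
\[
\text{so that}\qquad
\frac{0.01894}{\sqrt{k}} \;\le\; D(Q^{*k}) \;\le\; \frac{19.857}{\beta\,\sqrt{k}} .
\]
```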



Proof. We establish the lower bound using an inequality of Su [13] for the
discrepancy of an arbitrary probability measure $P$ on $S^1$:

(5)    $D(P) \ge \left( \frac{2}{\pi^2} \sum_{m=1}^{\infty} \frac{|\hat{P}(m)|^2}{m^2} \right)^{1/2}$.


Here $\hat{P}(m)$ denotes the $m$-th Fourier coefficient of $P$ (viewed on $\mathbb{R}$ mod $\mathbb{Z}$). Since
every term is nonnegative, we can use a dominant term in the sum as a lower bound.
The Fourier coefficients for the $d$-generator walk are

(6)    $\hat{Q}(m) = \sum_{j=1}^{d} \frac{1}{2d}\left(e^{2\pi i m \alpha_j} + e^{-2\pi i m \alpha_j}\right) = \frac{1}{d} \sum_{j=1}^{d} \cos(2\pi m \alpha_j)$.

Since $\cos(2\pi x) = \cos(2\pi\|x\|) \ge 1 - 2\pi^2\|x\|^2$, for any $m$ we have

(7)    $\hat{Q}(m) \ge 1 - \frac{2\pi^2}{d}\|m\alpha\|^2$.

The inequality $(1 - x)^k \ge 1 - kx$ holds for $k \ge 1$ and $x \le 1$. Then

$|\hat{Q}(m)|^k \ge 1 - \frac{2\pi^2 k}{d}\|m\alpha\|^2$

as long as $2\pi^2 k\|m\alpha\|^2/d < 1$. We ensure this by setting $Z_1 < 1$ (to be specified
shortly) and letting $q = \left(2\pi^2 B^2 k / (Z_1 d)\right)^{1/2}$. Then (2) implies there exists an integer
$m < q^d$ such that $\|m\alpha\| < B/q$, which yields $2\pi^2 k\|m\alpha\|^2/d < Z_1 < 1$, as desired.
(Note that $q$ was chosen to ensure $|\hat{Q}(m)|^k \ge 1 - Z_1$.)

For this $m$ we use just the $m$-th term in (5), and since $m < q^d$ and $\widehat{Q^{*k}}(m) =
\hat{Q}^k(m)$, we obtain

(8)    $D(Q^{*k}) \ge \frac{\sqrt{2}\,|\hat{Q}(m)|^k}{\pi m} \ge c_1 k^{-d/2}$

with

$c_1 = \frac{\sqrt{2}(1 - Z_1)}{\pi} \left( \frac{Z_1 d}{2\pi^2 B^2} \right)^{d/2}$.

Choosing $Z_1 = 2\pi^2/25 < 1$, we recover the lower bound (4) of the theorem.
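The lower-bound argument can be checked numerically. The following sketch is a sanity check under stated assumptions rather than part of the proof: it assumes the single-term bound has the form $\sqrt{2}\,|\hat{Q}(m)|^k/(\pi m)$ and that $Z_1 = 2\pi^2/25$, and the helper names `q_hat` and `single_term_lower_bound` are introduced here for illustration.

```python
import math

def q_hat(alpha, m):
    """Fourier coefficient of the step measure: (1/d) * sum_j cos(2 pi m alpha_j)."""
    return sum(math.cos(2 * math.pi * m * a) for a in alpha) / len(alpha)

def single_term_lower_bound(alpha, k, m_max):
    """Best value of sqrt(2) |q_hat(m)|^k / (pi m) over 1 <= m <= m_max."""
    return max(math.sqrt(2) * abs(q_hat(alpha, m)) ** k / (math.pi * m)
               for m in range(1, m_max + 1))

# Leading constant of C_1, assuming Z_1 = 2 pi^2 / 25
Z1 = 2 * math.pi ** 2 / 25
lead = math.sqrt(2) * (1 - Z1) / math.pi
print(round(lead, 4))                      # 0.0947

golden = (1 + math.sqrt(5)) / 2
for k in (10, 100, 1000):
    lb = single_term_lower_bound([golden], k, 200)
    # Compare with C_1 k^(-1/2) for d = 1, B = 1
    print(k, lb, 0.0947 / 5 / math.sqrt(k))
```

Searching all $m \le 200$ covers the range $m < q = 5\sqrt{k}$ used in the proof for the values of $k$ shown, so the numerically best single term should be at least the theoretical bound.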

The upper bound is trickier to compute. We start with the Erdős-Turán
inequality [?]: given a probability measure $P$ on $S^1$, for any integer $M$,

$D(P) \le \frac{4}{M+1} + \frac{4}{\pi} \sum_{m=1}^{M} \frac{|\hat{P}(m)|}{m}$

where $\hat{P}$ represents the Fourier transform of $P$. Note that one may choose $M$ in
the Erdős-Turán inequality so as to optimize the bound obtained.
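A short numerical sketch shows how the free parameter $M$ trades off the two terms of the Erdős-Turán bound when applied to $Q^{*k}$. It assumes the constants $4/(M+1)$ and $4/\pi$ and uses $\widehat{Q^{*k}}(m) = \hat{Q}(m)^k$; the names `erdos_turan_bound` and `best_bound` are hypothetical helpers, not from the paper.

```python
import math

def q_hat(alpha, m):
    """Fourier coefficient of the step measure: (1/d) * sum_j cos(2 pi m alpha_j)."""
    return sum(math.cos(2 * math.pi * m * a) for a in alpha) / len(alpha)

def erdos_turan_bound(alpha, k, M):
    """4/(M+1) + (4/pi) * sum_{m=1}^{M} |q_hat(m)|^k / m."""
    tail = sum(abs(q_hat(alpha, m)) ** k / m for m in range(1, M + 1))
    return 4 / (M + 1) + (4 / math.pi) * tail

def best_bound(alpha, k, M_max):
    """Minimize the bound over M = 1..M_max, accumulating the sum incrementally."""
    best, tail = float("inf"), 0.0
    for M in range(1, M_max + 1):
        tail += abs(q_hat(alpha, M)) ** k / M
        best = min(best, 4 / (M + 1) + (4 / math.pi) * tail)
    return best

golden = (1 + math.sqrt(5)) / 2
for k in (100, 400, 1600):
    print(k, best_bound([golden], k, 300))

# A rational generator: |q_hat(1)| = 1, so the bound never improves with k.
print(erdos_turan_bound([0.5], 1, 1), erdos_turan_bound([0.5], 50, 1))
```

The rational example mirrors the remark that convergence fails when all generators are rational: the Fourier coefficient at $m = 1$ has modulus 1, so its $k$-th power does not decay.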

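Since $|\cos(2\pi x)| \le 1 - 4\|2x\|^2$ for all $x \in \mathbb{R}$, it follows from (6) that

$|\hat{Q}(m)| \le \frac{1}{d} \sum_{j=1}^{d} |\cos(2\pi m \alpha_j)|$

$\le 1 - \frac{4}{d} \sum_{j=1}^{d} \|2m\alpha_j\|^2 = 1 - \frac{4}{d}\|2m\alpha\|^2$

$\le \exp\left(-\frac{4}{d}\|2m\alpha\|^2\right)$.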

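In light of the Erdős-Turán inequality and $\widehat{Q^{*k}}(m) = \hat{Q}(m)^k$, we need to
estimate a sum of the form

$\sum_{m=1}^{M} \frac{|\hat{Q}(m)|^k}{m} \le \sum_{m=1}^{M} \frac{1}{m} \exp\left(-\frac{4k}{d}\|2m\alpha\|^2\right) =: S$.
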

Since $M$ may be chosen freely, choose an integer $M$ such that

(10)    $M \le \frac{1}{2}\left(\frac{\beta^2 k}{d^3}\right)^{d/2} < M + 1$.

Recall that $\beta$ is an approximability constant for $\alpha$ from (3) and $k$ is the number of
steps in the walk. The reason for this choice of $M$ will be evident later. Choose an
integer $J$ such that

(11)    $2^{J-1} \le M \le 2^J - 1$.

The sum in $S$ may be grouped into $J$ cohorts of integers $m \in [2^{j-1}, 2^j - 1]$
for $j = 1, \ldots, J$. Within each such cohort, the use of inequality (3) yields $\|2m\alpha\| >
\beta/(2m)^{1/d} > \beta/2^{(j+1)/d}$. This says that the points of the sequence $\{2m\alpha\}$ (mod
1) in the unit cube in $\mathbb{R}^d$ are bounded away from the corners of the unit cube. In
fact, they are bounded away from each other, since if $m_1, m_2 \in [2^{j-1}, 2^j - 1]$ are distinct, then

(12)    $\|2(m_1 - m_2)\alpha\| > \beta/2^{(j+1)/d}$

as well. Therefore if $s = \beta/(\sqrt{d}\,2^{(j+1)/d})$, then any box of side length $s$ can contain
at most one of the multiples from the sequence $\{2m\alpha\}$ (mod 1), $m \in [2^{j-1}, 2^j - 1]$,
with no such multiples in boxes containing the origin.

So divide up the unit $d$-cube into disjoint boxes of side length $s$ and sides
parallel to the axes. In the worst case, all $M$ points are distributed in the boxes
in the $2^d$ corners of the unit cube nearest the origin. The nearest points in such
boxes (under the $L_1$ metric) are at integral multiples of $s$ and extend out in layers
to distance at most $M^{1/d}$. A crude upper bound for the number of boxes whose
nearest point is at $L_1$-distance $ns$ from the origin is $(n + 1)^{d-1}$, and in the Euclidean
metric the nearest point in such boxes is at least distance $ns/\sqrt{d}$ from the origin.
Hence we can bound $S$ by grouping first by cohorts, then by corners and layers:

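$S \le \sum_{j=1}^{J} \sum_{m=2^{j-1}}^{2^j - 1} \frac{1}{m} \exp\left(-\frac{4k}{d}\|2m\alpha\|^2\right)$

$\le \sum_{j=1}^{J} \sum_{n \ge 1} [\ldots]$

$\le \sum_{j=1}^{J} 2^{d+1-j} \sum_{n \ge 1} (n + 1)^{d-1} \exp\left(-4n^2 d\, 2^{2(J-j-1)/d}\right)$.
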

The second inequality used the bound $\|2m\alpha\| \ge ns/\sqrt{d}$ and the definition of $s$. The
third inequality follows by noting $k \ge d^3 2^{2J/d}/\beta^2$ from the definitions of $J$ and $M$
in (10) and (11).

Using $j \le J$, the log derivative of the expression of the innermost sum with
respect to $n$ can be bounded:

$\frac{d-1}{n+1} - 8nd\,2^{2(J-j-1)/d} \le \frac{d-1}{n+1} - d \le -1$

for all choices of $n \ge 1$ and $d \ge 1$. Hence the expression in the inner sum decreases
geometrically (by at least ratio $e^{-1}$) as $n$ increases, and so the inner sum is bounded
by the first term (at $n = 1$) times the constant $Z_2 = 1/(1 - e^{-1}) \approx 1.5820$. Thus

$S \le Z_2 \sum_{j=1}^{J} 2^{2d-j} \exp\left(-4d\,2^{2(J-j-1)/d}\right)$.

This sum may be bounded by noting that the largest term occurs when $j = J$. For
$j \le J$, the log derivative of the terms with respect to $j$ is $\ln 2\,(-1 + 8 \cdot 2^{2(J-j-1)/d}) \ge
\ln 2$ for $d \ge 1$. Thus the terms decrease geometrically (with at least ratio 1/2) as
$j \le J$ decreases, so the sum is bounded by twice the final term at $j = J$:

(13)    $S \le 2 Z_2\, 2^{2d-J} \exp\left(-4d\,2^{-2/d}\right) \le \frac{Z_2\, 2^{2d+1} \exp\left(-4d\,2^{-2/d}\right)}{M+1} \le \frac{4.6559}{M+1}$

where the second inequality follows from (11), and the final inequality from noting
that the numerator is greatest for $d = 1$. Using the Erdős-Turán inequality we
obtain the upper bound

$D(Q^{*k}) \le \frac{4}{M+1} + \frac{4}{\pi} S \le \frac{4 + (4/\pi)(4.6559)}{M+1} \le \frac{9.9285}{M+1}$

for all $d \ge 1$. An application of (10) produces

$D(Q^{*k}) < 19.857\, d^{3d/2} \beta^{-d} k^{-d/2}$

which establishes the constant $C_2$ in the statement of the theorem. □
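The numerical constants in the upper bound can be re-derived with a few lines of arithmetic. This is a sketch under assumptions: it takes the numerator in (13) to be $Z_2\,2^{2d+1}\exp(-4d\,2^{-2/d})$ and the Erdős-Turán constants to be $4$ and $4/\pi$; the function name `numerator` is introduced here for illustration.

```python
import math

Z2 = 1 / (1 - math.exp(-1))                # geometric-series constant

def numerator(d):
    """Z_2 * 2^(2d+1) * exp(-4 d 2^(-2/d)), the quantity bounded in (13)."""
    return Z2 * 2 ** (2 * d + 1) * math.exp(-4 * d * 2 ** (-2 / d))

values = {d: numerator(d) for d in range(1, 11)}
worst_d = max(values, key=values.get)      # dimension with the largest numerator

# Constant multiplying 1/(M+1) after inserting S <= 4.6559/(M+1)
et_const = 4 + (4 / math.pi) * 4.6559

# (10) gives 1/(M+1) < 2 (d^3 / (beta^2 k))^(d/2), hence the factor 2
print(round(Z2, 4), worst_d, values[1], 2 * et_const)
```

The printed values should sit just below the rounded constants 4.6559 and 19.857 quoted in the proof, which suggests those constants were rounded upward.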



We remark that this argument simplifies the proof given in [12] for the case
$d = 1$. For specific $d$, the constants $C_1, C_2$ can be improved by adjusting the
derivations of the constants $Z_1, Z_2$ and the bound for the last inequality in (13).

3. Choosing Generators 

Badly approximable $d$-tuples are plentiful. Cassels shows that there are
uncountably many badly approximable $d$-tuples in $\mathbb{R}^d$. Moreover, results of Schmidt
[10, pp. 53-59] imply that the Hausdorff dimension of the set of badly approximable
vectors is positive. For some concrete examples, it can be shown (see, e.g., [?, p.
68]) that if $1, \alpha_1, \ldots, \alpha_d$ are linearly independent over $\mathbb{Z}$, and if the degree of the
extension $[\mathbb{Q}(\alpha_1, \ldots, \alpha_d) : \mathbb{Q}] = d + 1$, then $\alpha$ is badly approximable.

We have shown that badly approximable $d$-tuples in $\mathbb{R}^d$ give rise to random
walks with the fastest convergence, established sharp rates of convergence, and
exhibited how the constants depend on the approximation constants $\beta, B$ of the
given $d$-tuple. However, just among badly approximable $d$-tuples, can we say which
random walks converge "the fastest"? By Theorem 1, this amounts to identifying
$d$-tuples with the largest possible approximation constant $\beta$.

For $d = 1$, the number $\phi = (1 + \sqrt{5})/2$ satisfies (3) and is therefore badly
approximable; in fact, it is known to be the "most" badly approximable number in the
sense that its approximation constant $\beta$ is larger than for any other number.

For $d = 2$, we conjecture that the "most" badly approximable 2-tuple is the
vector $v = (\gamma^{-1}, \gamma^{-2})$, where $\gamma \approx 1.3247$ is the unique real root of $x^3 - x - 1$. For
this $v$, we believe (based on heuristic arguments and numerical evidence) that for
any $\beta < 0.54850...$, there is a sufficiently large $N$ such that $n > N$ implies that

(14)    $\|nv\| \ge \beta\, n^{-1/2}$.

Furthermore, no other vector can have a much larger approximation constant $\beta$,
because the work of Davenport and Mahler implies that if $\beta > (2/\sqrt{23})^{1/2} \approx
0.64577...$, then there is no vector $v$ in $\mathbb{R}^2$ for which (14) holds for all $n$.

Thus the pair $(\gamma^{-1}, \gamma^{-2})$ yields a random walk on the circle with 2 generators
whose convergence rate appears close to fastest possible over all badly approximable
pairs. Is it fastest? For larger $d$ the question is also open.
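Readers who want to experiment with the $d = 2$ conjecture can tabulate $\sqrt{n}\,\|nv\|$ directly. The sketch below is exploratory and makes two assumptions that should be checked against the original: that the conjectured pair reads $v = (\gamma^{-1}, \gamma^{-2})$, and that double precision is adequate over the range of $n$ shown. The helper names `lattice_dist`, `plastic_number` and `scaled_dist` are introduced here and are not from the paper.

```python
import math

def lattice_dist(x):
    """Euclidean distance from the point x in R^d to the nearest integer point."""
    return math.sqrt(sum((t - round(t)) ** 2 for t in x))

def plastic_number(iterations=60):
    """Real root of x^3 - x - 1 via Newton's method, started at 1.5."""
    x = 1.5
    for _ in range(iterations):
        x -= (x ** 3 - x - 1) / (3 * x ** 2 - 1)
    return x

gamma = plastic_number()
v = (1 / gamma, 1 / gamma ** 2)            # assumed reading of the conjectured pair

def scaled_dist(n):
    """sqrt(n) * ||n v||, the quantity compared with beta in (14)."""
    return math.sqrt(n) * lattice_dist([n * t for t in v])

# Smallest scaled distance over a window of moderately large n
tail_min = min(scaled_dist(n) for n in range(1000, 20001))
print(gamma, scaled_dist(1), tail_min)
```

A finite window can only suggest a value for the limiting constant; it neither confirms nor refutes the conjectured threshold.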